I. INTRODUCTION


Dengue is a global disease with severe health consequences, especially in tropical and subtropical countries, where Aedes aegypti has adapted easily. The mosquito's ability to adapt to new environments has allowed it to spread across nearly the entire globe [1]. With climate change, this vector is moving into regions that previously had no problems with it. Given the spread of the mosquito and the growing threat of the disease, Public Administration authorities must act preventively to control and combat dengue [2]. This research therefore presents a predictive model for the occurrence of dengue with a forecast horizon of up to 10 weeks, so that public managers can plan and carry out actions against possible disease outbreaks in cities. The model used meteorological and dengue data from January 2010 to August 2020. These data were organized in an Excel spreadsheet, and association and classification rules were extracted with the CBA software. Morrinhos, a city in the interior of Brazil that constantly faces dengue problems, was chosen as the case study.
Abstract - Dengue has become a global concern because of the number of people threatened by the disease, especially in countries with tropical climates, and because the mosquito that transmits it adapts easily to new environments. The objective of this work is to present a predictive model based on mining meteorological and dengue data organized in an Excel spreadsheet. From these data, association and classification rules were generated with the CBA software so that public administrations can plan preventive actions to control disease outbreaks. Based on these rules, it was possible to forecast the occurrence of dengue up to 10 weeks ahead, as proposed in the research.


II. THEORETICAL REFERENCES


2.1. Dengue: adaptation and advances of the disease in the 
world 

Dengue has recently become one of the epidemiologically most relevant diseases worldwide, making it a global public health problem; in Latin America, outbreaks have occurred every three to five years since the 17th century [3]. Climate change has contributed to the expansion of the geographic distribution of Aedes aegypti on a global scale, including into areas where the mosquito had not been reported before, such as southern Buenos Aires Province and Eastern Patagonia (low temperatures, 16.5 °C to 11 °C) [4]. Climate change may also extend the range of Aedes aegypti into temperate areas of the United States of America, indicating that the mosquito has the potential to spread throughout North America [5].


The adaptability of the mosquito that transmits dengue is related to its ecophysiological (genetic and environmental) plasticity, that is, the ability of a species to survive in different habitats. The more ecophysiological plasticity Aedes aegypti displays in environments with extreme temperatures (cold winters or periods of heat), the wider the distribution limits of this vector, which increases the risk of the disease emerging [1].


The spread of Aedes aegypti is influenced by the population's susceptibility and exacerbated by human migration, whether within a country or between countries, including over large distances [6]. An infected female mosquito can transmit the virus to humans throughout her life [7]. By the end of the 20th century, dengue had spread throughout tropical countries and threatened a third of the world's population, a threat aggravated by the fact that Aedes aegypti can transmit four different dengue virus serotypes.


In terms of serotypes, the first laboratory-confirmed dengue epidemic in Brazil occurred in 1981 and 1982 in Boa Vista, the capital of Roraima (serotypes DENV-1 and DENV-4). In 1990, the DENV-2 serotype was detected in Rio de Janeiro, while DENV-3 appeared in 1999 in Amapá, Pará, Roraima, and Tocantins [8]. The first dengue epidemic recorded in Brazil, however, dates back to 1845 in Rio de Janeiro. Brazil accounts for more than 50% of dengue cases in the Americas, and the Central-West region (Goiás, Mato Grosso, Mato Grosso do Sul, and the Federal District) has the highest number of dengue cases per capita [9].

Many Brazilian cities lack running water or garbage collection services. As a result, waste thrown in backyards and water stored for consumption can provide breeding conditions for the mosquito [10]. The mosquito is anthropophilic and prefers to lay its eggs in containers holding water (artificial breeding sites), which favors its adaptation to the urban environment.


As evaluated by the same authors, the mosquito that transmits dengue can be introduced into ecosystems where it did not previously exist. The life cycle of Aedes aegypti has four phases: egg, larva, pupa, and adult. Females deposit their eggs on the edges of water containers, and embryonic development takes about 48 hours on average. However, the eggs can survive for more than a year without water, and humans (and, in some cases, animals) can carry them in containers over long distances [11].


Dengue viruses are transmitted to humans through the bite of an infected female mosquito. The disease can range from mild fever to severe forms such as hemorrhage and shock. Transmission from mosquito to human and from human to mosquito is called horizontal transmission. Recent studies, however, also consider vertical (transovarial) transmission, in which female mosquitoes pass the virus to their offspring [12].


The seasonal concentration of dengue in the hottest months of the year is related to the dynamics of the reproductive cycle of Aedes aegypti. Even though cases decrease in the colder months, this decrease is not enough to interrupt transmission [13]. Since 1998, Brazil has required dengue cases to be recorded in a national computerized system created in 1993, SINAN (Information System for Notifiable Diseases). All municipalities, states, and the Federal District must report cases of the disease [14].


SINAN is intended to record the occurrence of outbreaks and epidemics, to measure the magnitude of notifiable diseases (including dengue), and to serve as a tool for public health planning in Brazil [14]. In the late 2000s, the increase in dengue cases in Brazil posed new challenges for disease control and prevention systems, whose health resources must be adequate to achieve the expected effectiveness and efficiency [15].


In the state of Goiás, according to research by Santos et al. [15], municipal Epidemiological Surveillance coordinators have great difficulty preventing the most severe cases of dengue, and most do not know whether their municipality has a contingency plan for dengue epidemics. In Goiânia, the capital of Goiás, dengue epidemics occurred for three consecutive years starting in 2008, and the worst occurred in 2013, with 58,024 confirmed cases, 89.5% of which occurred between December 30, 2012, and June 29, 2013 [16].


Mathematical and statistical models are recommended for predicting dengue outbreaks, as they help capture the actual behavior of the disease. According to Nascimento et al. [16], monitoring severe cases through information integration tools is an essential strategy for controlling dengue epidemics and reducing lethality, because it supports adequate decision-making in planning and organizing health services. Erandi et al. [17] developed a predictive model of dengue for the city of Colombo (Sri Lanka) based on the classic compartmental model, which they reduced to a simpler quasi-equilibrium Infection/Recovery model to understand the dynamics of disease transmission. They note that more promising predictive models for dengue must consider external variables, among which climatic factors such as precipitation, temperature, and humidity gave the best results; however, they were unable to use these variables in their own study [17].


2.2. Public administration and dengue 


Infrastructure improvements are not expected to keep pace with urban growth in developing countries, whose urban growth rates are expected to double by 2050. Moreover, the increase in arbovirus cases has been driven largely by urbanization, environmental deterioration, poverty, and social inequality. When the Public Administration intervenes without the direct participation of the community, the intervention will neither have the desired impact nor be sustainable. A positive example of community intervention was recorded on the Kenyan coast, where environmental education and cleaning campaigns helped reduce malaria and diseases transmitted by Aedes mosquitoes. Improving housing toward a more sustainable built environment also helps fight vector-borne diseases [2].


Pandemics and their consequences should lead cities to review how they provide services and to rethink how they plan their space. The city of the future is expected to be more ecologically and economically resilient, with healthier and more livable neighborhoods that are available and accessible to all. In such a city, improvements in the quality of life of local communities should come from a Public Administration that promotes inclusion, equity, urban accessibility, and sustainability, together with community-based services. In other words, once the Public Administration recognizes the need for change, it should develop integrated, community-based planning of public services and implement solutions that connect local governments, associations, communities, and the population [18].


The ability to guarantee secure access to essential household utilities, such as energy and water, and to health infrastructure is directly related to Public Administration interventions in health policy and to how vulnerable families bear the burden of disease. This is mainly because unequal access to public policy resources (public services, medical care, etc.) and to health interventions makes it harder for the low-income population to protect itself from pandemics [19]. Strategies are therefore needed to mitigate the effects of a pandemic on low-income (vulnerable) families, given that economic vulnerability, difficulty in accessing public policies, and low health spending capacity can hinder the city's ability to respond to disease outbreaks with appropriate policies.


III. RESEARCH METHOD


The research began with a bibliographic survey on dengue and on the need to find ways to support public health in the fight against the disease, based on articles addressing the subject, especially in the Americas and in Brazil.


Data mining research generally proceeds in three phases: a) data collection, b) data preparation, and c) modeling [20].
3.1. Data collection


In data mining, the collection phase may require specialized hardware or manual work searching for documents on the Web. In this research, the collection phase gathered meteorological and dengue data for Morrinhos, state of Goiás, Brazil, from January 2010 to August 2020. The meteorological data were obtained from the National Institute of Meteorology (INMET), linked to the Ministry of Agriculture, Livestock and Supply. INMET operates a meteorological station in Morrinhos, Goiás (latitude -17.745066, longitude -49.101698, altitude 751.09 m, installed on May 24, 2001, station code A003).


Climatic conditions such as temperature, rainfall intensity, and other climatic events can affect the reproduction and survival of Aedes aegypti and influence the rate of human morbidity [21]. The dengue data were obtained from the Notifiable Diseases Information System (SINAN) through the Dengue Notification/Investigation Bulletin, specifically the reports "Frequency per Epidemiological Week of Notification and Classification (Dengue)" and "Frequency per Epidemiological Week of Notification and Evolution (Deaths)", available in the Integrated Monitoring System Aedes Zero of the Goiás State Department of Health.


The Federal Government defines the epidemiological weeks of each year on the SINAN website (available at http://www.portalsinan.saude.gov.br/calendario-epidemiologico), and the Epidemiological Surveillance Centers of the municipalities, states, and the Federal District provide the records of notifications and investigations according to this Epidemiological Calendar. Meteorological and dengue data were collected so that, by cross-analyzing them, it could be verified whether the occurrence of disease cases follows a pattern in relation to climatological data.
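Because the dengue records are indexed by epidemiological week, each daily meteorological reading must be mapped to the same week. The sketch below is a minimal illustration, not the official calendar: it assumes the common convention that weeks run from Sunday to Saturday and that week 1 is the first such week with at least four days in January (the week containing January 4). The published SINAN calendar remains the reference; the function names `week1_start` and `epi_week` are hypothetical.

```python
from datetime import date, timedelta


def week1_start(year):
    """Sunday that opens epidemiological week 1 of `year` (assumed convention).

    Weeks are assumed to run Sunday to Saturday, and week 1 is taken to be the
    week containing January 4, i.e. the first week with at least four days in January.
    """
    jan4 = date(year, 1, 4)
    # date.weekday() returns Monday=0 ... Sunday=6, so this counts days since Sunday.
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)


def epi_week(d):
    """Return (epidemiological year, week number) for calendar date d."""
    year = d.year
    if d >= week1_start(year + 1):
        year += 1  # late-December day already belongs to next year's week 1
    elif d < week1_start(year):
        year -= 1  # early-January day still belongs to the previous year's last week
    return year, (d - week1_start(year)).days // 7 + 1


print(epi_week(date(2010, 1, 2)))  # (2009, 52)
print(epi_week(date(2010, 1, 3)))  # (2010, 1)
```

Under this assumed convention, 2 January 2010 still falls in the last week of 2009, and week 1 of 2010 begins on Sunday, 3 January 2010.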


According to Erandi et al. [17], meteorological data include essential variables that affect the behavior of the Aedes aegypti mosquito, such as temperature, relative humidity, precipitation, drought, atmospheric pressure, wind speed, and solar radiation. However, the dengue data are weekly (by epidemiological week), whereas the meteorological data are daily. It was therefore necessary to calculate the weekly average of each meteorological variable over the same seven-day periods, except for precipitation, for which the weekly total was computed, since what matters is the total volume of rainfall in each week, not its average.
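The weekly aggregation described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the procedure actually carried out in the spreadsheet: the variable names (`temp_med`, `precipitation`) are hypothetical, weeks are assumed to run Sunday to Saturday, and missing daily readings (`None`) are simply skipped.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

SUMMED = {"precipitation"}  # weekly total instead of weekly mean (hypothetical name)


def week_start(d):
    """Sunday opening the (assumed Sunday-to-Saturday) week that contains d."""
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_aggregate(daily):
    """daily: iterable of (date, {variable: value}) pairs.

    Returns {week_start_date: {variable: weekly value}}: precipitation is summed,
    every other variable is averaged over the days with a reading.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for day, values in daily:
        for name, value in values.items():
            if value is not None:  # skip missing readings
                buckets[week_start(day)][name].append(value)
    return {
        wk: {name: (sum(vals) if name in SUMMED else mean(vals))
             for name, vals in variables.items()}
        for wk, variables in sorted(buckets.items())
    }


# Hypothetical week starting Sunday, 5 January 2020.
daily = [(date(2020, 1, 5) + timedelta(days=i),
          {"temp_med": 24.0 + i, "precipitation": 10.0}) for i in range(7)]
weekly = weekly_aggregate(daily)
# weekly mean temp_med is 27.0 and weekly total precipitation is 70.0
```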


3.2. Data Preparation 


The second phase, data preparation (pre-processing), is the most crucial part of data mining, yet it is sometimes given little importance because attention usually goes to the analytical aspects of the method [20]. As Aggarwal [20] points out, data preparation begins right after data collection and consists of the following steps:
a) feature extraction: the researcher extracts the essential characteristics of the research data for a specific application;


b) data cleaning: removing unnecessary records, handling missing entries, and, if necessary, eliminating inconsistencies; and


c) feature selection and transformation: removing useless features or transforming current features into a new range of data that is easier to analyze.


From the dengue data obtained from SINAN, a new variable was created with the number of consecutive weeks with dengue cases. To verify whether accumulated rainfall correlates with cases of the disease, new variables were also derived from the INMET meteorological data, such as precipitation accumulated over two, four, eight, and ten weeks and the number of consecutive weeks with rain, since soil saturation by rainwater tends to increase the number of breeding sites for Aedes aegypti [11].


3.3. Modeling 


Four tasks are essential in data mining (clustering, classification, association pattern mining, and outlier detection) for understanding the nature of the relationships in the data. Consider a multidimensional database D with n records and d attributes, represented as an n × d matrix in which each row is a record and each column is a dimension. Relationships between data items can be relationships between columns, which establish positive or negative associations or are used to predict the value of one column from the others (data classification); or relationships between rows, which support the identification of clusters and anomalies (outlier analysis) [20].


In this research, data mining was performed on a dataset of 572 rows and 26 columns of meteorological and dengue variables, organized and processed in an Excel spreadsheet. Mining the meteorological and dengue data from Morrinhos, state of Goiás, Brazil, involves a multivariate analysis and the generation of association and classification rules with the CBA software (a quantitative model). Association rules with high support and confidence can help predict dengue outbreaks; they take the form "IF (X), THEN (Y)", which makes the model easy to interpret because it relates conditions to outcomes (IF/THEN) [22]. When the consequent Y is the class to be predicted, such rules facilitate prediction and are called classification rules.


Let A and B be sets of variables (items). The rule A => B satisfies minimum support s and minimum confidence c if both of the following hold: a) the support of A ∪ B, that is, the fraction of records containing all items of A and B, is at least s; and b) the confidence of A => B, defined as sup(A ∪ B) / sup(A), is at least c [20]. The present research aims to forecast dengue cases up to 10 (ten) weeks ahead so that those responsible for the city's health can plan and anticipate actions to prevent disease outbreaks.
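The support and confidence definitions above can be checked on a toy example. In this minimal sketch each week is represented as a set of items (discretized conditions such as `temp_med_>_25`); the items and weeks are hypothetical and only illustrate the arithmetic, not the internal implementation of the CBA software.

```python
def support(records, itemset):
    """Fraction of records (sets of items) that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """conf(A => B) = sup(A ∪ B) / sup(A); 0.0 if A never occurs."""
    sup_a = support(records, antecedent)
    if sup_a == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / sup_a


def is_valid_rule(records, antecedent, consequent, min_sup, min_conf):
    """True if A => B meets both the minimum support and the minimum confidence."""
    both = set(antecedent) | set(consequent)
    return (support(records, both) >= min_sup
            and confidence(records, antecedent, consequent) >= min_conf)


# Hypothetical weeks, each described by its items.
WEEKS = [
    {"temp_med_>_25", "precip_acum_s+8_>_310", "dengue_s2_<_1"},
    {"temp_med_>_25", "dengue_s2_<_1"},
    {"temp_med_>_25", "dengue_s2_>_4"},
    {"dengue_s2_>_4"},
]
A, B = {"temp_med_>_25"}, {"dengue_s2_<_1"}
print(support(WEEKS, A | B), confidence(WEEKS, A, B))  # 0.5 and 2/3
```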


IV. RESULTS AND DISCUSSION 


Creating a predictive model requires analyzing several variables that have a direct (or sometimes indirect) correlation with the phenomenon studied. In small cities, where the absolute values of the variables are low, it has been proposed to work with the variables in relative terms [17]. In addition, a model that can predict the occurrence of a pandemic (or even an outbreak) is highly relevant to city management, particularly for the health authorities [14 - 15].


In Morrinhos, with an estimated population of 46,955 [23], the absolute numbers of dengue cases and deaths in the period analyzed (2010 to 2020) are low. Nevertheless, because the transmission potential of the mosquito is high, dengue remains a concern for the local community [6 - 21]. The predictive model therefore used the morbidity level of the disease in Morrinhos (GO), with the data divided into tertiles (low, medium, and high) that served as the classes of the classification rules built with the CBA software [11 - 21].
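A minimal sketch of the tertile split, assuming that the morbidity level is the number of weekly cases per 100,000 inhabitants and that the cut points are the 1/3 and 2/3 quantiles computed with Python's `statistics.quantiles(..., method="inclusive")`. Both choices are assumptions for illustration: the source reports the resulting class boundaries for Week 1 (fewer than 1, between 1 and 4, and more than 4 cases) but not how they were computed. In a small city with many weeks without cases, cut points can coincide, so they are worth inspecting before labeling.

```python
from statistics import quantiles


def morbidity_rate(cases, population, per=100_000):
    """Weekly cases per `per` inhabitants (assumed definition of morbidity level)."""
    return cases / population * per


def tertile_labels(values):
    """Split values into low / medium / high using the 1/3 and 2/3 quantiles."""
    low_cut, high_cut = quantiles(values, n=3, method="inclusive")

    def label(v):
        if v <= low_cut:
            return "low"
        if v <= high_cut:
            return "medium"
        return "high"

    return (low_cut, high_cut), [label(v) for v in values]


# Hypothetical weekly case counts for a city of 46,955 inhabitants.
weekly_cases = [0, 0, 0, 1, 2, 3, 5, 8, 13]
rates = [morbidity_rate(c, 46_955) for c in weekly_cases]
cuts, labels = tertile_labels(rates)
print(labels)
# ['low', 'low', 'low', 'medium', 'medium', 'medium', 'high', 'high', 'high']
```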


When generating the classification rules, the software scans the entire dataset (meteorological variables, disease morbidity, and death variables) to find rules with high support linking the input variables to the output (class) variable. Then, in the model validation phase, the algorithm selects a set of these rules to classify the disease occurrence level [11 - 20 - 21].
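The classification step can be sketched as an ordered rule list in the spirit of the CBA classifier: rules are ranked by confidence and then by support, and a week receives the class of the first rule whose conditions it satisfies, or a default class when no rule matches. This is a simplified illustration, not the CBA software's code; the `Rule` type and the default class are hypothetical, and the two rules are Rule 3 and Rule 5 of Week 2 as reported later in this section, with their support written as fractions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all be present in the week
    label: str             # predicted class, e.g. "dengue_s2_<_1"
    support: float
    confidence: float


def rank(rules):
    """Higher confidence first, then higher support; ties keep their input order."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def classify(week_items, ranked_rules, default_label):
    """Class of the first rule whose antecedent is contained in the week's items."""
    for rule in ranked_rules:
        if rule.antecedent <= week_items:
            return rule.label
    return default_label


RULES = [
    Rule(frozenset({"sem_com_dengue_<_0", "precip_acum_s+8_>_310", "temp_med_>_25"}),
         "dengue_s2_<_1", support=0.07273, confidence=1.0),
    Rule(frozenset({"p_atm_max_>_933", "precip_acum_s+10_<_85", "sem_com_dengue_>_6"}),
         "dengue_s2_>_4", support=0.08727, confidence=1.0),
]
ranked = rank(RULES)  # equal confidence, so the higher-support rule comes first
week = {"p_atm_max_>_933", "precip_acum_s+10_<_85", "sem_com_dengue_>_6", "temp_med_>_25"}
print(classify(week, ranked, default_label="dengue_s2_<_1"))  # dengue_s2_>_4
```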


The software reports the performance of the classification rules as a confusion matrix, in which the main diagonal shows the correct classifications and the cells outside this diagonal show the incorrect ones [22]. Fig. 1 shows the confusion matrix for the first week (s1) of the disease forecast for Morrinhos (GO).


Overall Error: 15.25%
Confusion Matrix On Training (rows: actual class; columns: classified as (1), (2), (3))
(1): dengue_s1 < 1
(2): 1 < dengue_s1 < 4
(3): dengue_s1 > 4
Correctly classified (main diagonal): (1) 271, (2) 35, (3) 161

Fig. 1: Confusion matrix to predict disease occurrence in Week 1 (s1).
The first row refers to the first tertile (fewer than 1 dengue case), the second row to the second tertile (between 1 and 4 cases), and the third row to the third tertile (more than 4 cases), for the epidemiological weeks from 2010 to 2020 entered into the model. The model's accuracy for Week 1 (the first week of the forecast) is 84.75%, that is, 100% minus the 15.25% overall error. The software calculates accuracy by adding the values on the main diagonal of the matrix (271 + 35 + 161 = 467), dividing by the total number of samples (551), and multiplying by 100 (467 ÷ 551 × 100 ≈ 84.75%) [22].
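The accuracy arithmetic can be reproduced directly. The sketch below uses only the figures given in the text (diagonal counts 271, 35, and 161 out of 551 weeks); the general `accuracy` function, which works on a full confusion matrix, is included for completeness and is demonstrated on a hypothetical 2 × 2 matrix.

```python
def accuracy(matrix):
    """Correct classifications (main diagonal) divided by all classifications."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def accuracy_from_diagonal(diagonal, total):
    """Same ratio when only the diagonal counts and the total are known."""
    return sum(diagonal) / total


acc = accuracy_from_diagonal([271, 35, 161], 551)
print(f"accuracy = {acc:.2%}, overall error = {1 - acc:.2%}")
# accuracy = 84.75%, overall error = 15.25%
print(accuracy([[2, 0], [1, 1]]))  # hypothetical matrix -> 0.75
```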


The predictive model was applied to each of the next ten weeks, with the accuracies shown in Fig. 2. Ideally, accuracy should be 100% or close to it [20]. Even so, given the low absolute numbers of dengue cases in Morrinhos, the results are meaningful in light of the concern about the spread of the disease [13 - 16].
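Forecasting k weeks ahead can be understood as pairing each week's inputs with the class observed k weeks later, which yields one training set per horizon (s1 to s10). The sketch below shows this alignment under that assumption; the function name and data layout are hypothetical. One consequence is that the last k weeks have no future label, so each longer horizon has slightly fewer usable weeks.

```python
def horizon_dataset(features, classes, k):
    """Pair the inputs of week t with the class observed in week t + k.

    features and classes are aligned week by week; the last k weeks have no
    future class and are dropped.
    """
    if len(features) != len(classes):
        raise ValueError("features and classes must cover the same weeks")
    if not 1 <= k < len(features):
        raise ValueError("horizon k must satisfy 1 <= k < number of weeks")
    return [(features[t], classes[t + k]) for t in range(len(features) - k)]


# Hypothetical four-week series.
weeks = [{"a"}, {"b"}, {"c"}, {"d"}]
classes = ["low", "high", "medium", "low"]
print(horizon_dataset(weeks, classes, 1))
# One training set per horizon, s1 to s10, would be built as:
# datasets = {k: horizon_dataset(features, classes, k) for k in range(1, 11)}
```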


[Chart: Model Accuracy, weeks s1 to s10]

Fig. 2: Model accuracy for weeks s1 to s10.


The CBA software generated the following number of classification rules for each forecast week: s1: 108; s2: 91; s3: 92; s4: 95; s5: 81; s6: 82; s7: 97; s8: 103; s9: 98; and s10: 95. Among the generated rules, those with 100% confidence (reliability) were analyzed. This analysis showed that the model did not relate the meteorological variables 'precipitation' and 'temperature' to the occurrence of dengue cases, as in Rule 5 of Week 2 (s2) and Rule 8 of Week 7 (s7); the same pattern was repeated in the other weeks:
Rule 5:
sem_com_dengue_<_0=Y
precip_acum_s+8_>_310=Y
temp_med_>_25=Y
-> Class = dengue_s2_<_1
(7.273% 100.000% 40 40 7.273%)

Rule 8:
temp_min_>_19=Y
sem_com_chuva_>_5=Y
temp_med_>_25=Y
rad_sol_>_20=Y
-> Class = dengue_s7_<_1
(5.321% 100.000% 29 29 5.321%)


Rule 5 (s2) has a support of 7.273% and a confidence (reliability) of 100%: even with more than 310 mm of rain accumulated over 8 weeks and an average temperature above 25 °C, when there is currently no dengue, the model predicts fewer than 1 case (that is, none) in the second following week. The same occurs in Rule 8 (s7), with a support of 5.321% and 100% confidence: with a minimum temperature above 19 °C, more than five consecutive weeks with rain, an average temperature above 25 °C, and solar radiation greater than 20, the model predicts no dengue cases in the seventh following week.
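Each rule above is a conjunction of threshold conditions on weekly variables. The minimal sketch below parses items such as `temp_med_>_25=Y` and checks whether a rule's antecedent holds for one week. The item format `<variable>_<op>_<threshold>=Y` is inferred from the printed rules, strict inequalities are assumed at the thresholds, and the week values are hypothetical.

```python
import operator
import re

# Assumed item format, inferred from the printed rules: <variable>_<op>_<threshold>=Y
ITEM = re.compile(r"^(?P<var>.+?)_(?P<op>[<>])_(?P<thr>-?\d+(?:\.\d+)?)=Y$")
OPS = {"<": operator.lt, ">": operator.gt}


def parse_item(item):
    """Split an item such as 'temp_med_>_25=Y' into (variable, comparison, threshold)."""
    match = ITEM.match(item.strip())
    if match is None:
        raise ValueError(f"unrecognized rule item: {item!r}")
    return match["var"], OPS[match["op"]], float(match["thr"])


def rule_fires(items, week):
    """True when every condition of the rule holds for the week's values."""
    for item in items:
        variable, compare, threshold = parse_item(item)
        if not compare(week[variable], threshold):
            return False
    return True


rule_8 = ["temp_min_>_19=Y", "sem_com_chuva_>_5=Y", "temp_med_>_25=Y", "rad_sol_>_20=Y"]
# Hypothetical week: every condition holds, so Rule 8 would predict dengue_s7_<_1.
week = {"temp_min": 20.1, "sem_com_chuva": 6, "temp_med": 25.4, "rad_sol": 21.0}
print(rule_fires(rule_8, week))  # True
```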


On the other hand, in the rules where dengue cases appear among the input variables, the model forecasts dengue in the following weeks regardless of climatic factors. In Rule 3 (s2), with a support of 8.727% and 100% confidence, the maximum atmospheric pressure is greater than 933 hPa and the rainfall accumulated over the last ten weeks is less than 85 mm; still, because there have been more than six consecutive weeks with dengue cases, the model predicts more than four cases in the second week (dengue_s2_>_4).


In Rule 1 (s8), with a support of 7.537% and 100% confidence, the accumulated rainfall over the last eight weeks is less than 65 mm and the minimum temperature is below 15 °C; even so, because there have been more than six consecutive weeks with dengue cases, the model predicts more than four cases in the eighth week (dengue_s8_>_4).
Rule 3:
p_atm_max_>_933=Y
precip_acum_s+10_<_85=Y
sem_com_dengue_>_6=Y
-> Class = dengue_s2_>_4
(8.727% 100.000% 48 48 8.727%)

Rule 1:
precip_acum_s+8_<_65=Y
sem_com_dengue_>_6=Y
temp_min_<_15=Y
-> Class = dengue_s8_>_4
(7.537% 100.000% 41 41 7.537%)


It should be noted, however, that milder temperatures do not prevent the proliferation of the arbovirus, given the vector's ability to adapt to new environments and temperatures [4].


V. FINAL CONSIDERATIONS 


Predictive models for the occurrence of diseases aim to help city managers organize resources, plan actions, and effectively control endemic diseases so that the impact on the local community is as small as possible [5 - 3]. The classification rules obtained for the dengue predictive model in Morrinhos indicate that the occurrence of the disease is more closely related to existing cases in the city than to meteorological factors such as precipitation and temperature.


The rules generated by the model indicate that prolonged periods of rain and high temperatures are not, by themselves, enough to produce a dengue epidemic, as shown by rules 5 (s2) and 8 (s7); a similar pattern appeared in the other weeks. Conversely, even when the meteorological data show little recent rain and low temperatures (rule 1, s8), rules 3 (s2) and 1 (s8) predict more than four dengue cases in the respective weeks; in both cases, the disease had been present for more than six consecutive weeks.


The state government provides monthly data on mosquito incidence, but monthly data contribute little to a model that works on a weekly scale. The lack of daily or weekly data from the city hall on the incidence of the dengue mosquito in Morrinhos, mapped by street and neighborhood, makes it difficult to build a predictive model. Since endemic disease control agents collect data daily, these data should also be made available daily or, at most, per epidemiological week, which would increase the efficiency and effectiveness of the predictive model.


VI. RECOMMENDATIONS FOR FUTURE STUDIES


Transmission occurs when Aedes aegypti bites an individual with dengue and then bites one or more healthy individuals. Therefore, the more mosquitoes there are in a region, the greater the chance of an epidemic [1 - 12]. Given the results of this research, it would be worthwhile to study the levels of Aedes aegypti infestation in cities, so that the model can better relate meteorological factors (precipitation, temperature, relative humidity, etc.) to the proliferation of this dengue vector.


The mere presence of Aedes aegypti in a city is already a cause for concern, since the arrival of a single infected individual could trigger an explosion of cases [8 - 10]. Given this possibility, a new predictive study relating meteorological variables to mosquito infestation variables would be both worthwhile and necessary.


In addition, studies indicate that this arbovirus is also associated with greater poverty [7 - 9], which warrants a predictive model that allows the Public Administration to follow up with vulnerable families in advance.